Abstract


Here we present an ergodic theorem which adapts the setting of iterated function systems (IFS) to ergodic transport theory. We adapt a theorem by J. Elton [2] to the classical thermodynamical formalism and also to ergodic transport. The main point here is to use an iterated function system to understand the main ergodic properties of transport plans (which solve a variational principle of ergodic transport, as described in previous papers). To illustrate the technique, we first explore the thermodynamical formalism case, and only after that do we proceed to the ergodic transport case. We do that in the following way: after briefly recalling Elton's theorem, we discuss how it can be used to characterise Gibbs measures (also called equilibrium measures) for expanding maps, in the usual setting of thermodynamical formalism, where one is interested in finding the measure that maximizes the pressure of a given potential. Such characterisation is done by constructing a stochastic process (defined by an IFS) whose empirical measure converges to the Gibbs measure, in the sense that the mean of any test function evaluated at the outcomes of this stochastic process converges to the integral of that test function with respect to the Gibbs measure. In this way we present a stochastic algorithm that computes integrals of functions. After this initial application, we turn our attention to ergodic transport: if we have two sets $X$ and $\Omega$, a measure $\mu$ on $X$ and a dynamics $T$ on $\Omega$, we can consider the set of probability measures on $X \times \Omega$ whose projections on the second coordinate are $T$-invariant, while the projections on the first coordinate are given by $\mu$. Such measures are called transport plans. An optimal transport plan is a plan that maximizes the integral of a certain potential function. We also call Gibbs plan (or equilibrium transport plan) any transport plan that maximizes a pressure functional that is defined (as in the classical case) by a potential function plus an entropy term. Gibbs plans can converge, under certain conditions, to optimal plans (this is called the zero-temperature limit). We show how Elton's theorem can be used to characterise Gibbs plans in ergodic transport theory, also by defining a stochastic process (defined by an IFS) whose empirical measures converge to the Gibbs plan, as in the classical thermodynamical formalism case. We provide examples and show explicit calculations in the case where $X$ has two elements and the cost function depends on the first two coordinates of the points of $\Omega$.

1 Introduction 

Ergodic transport can be motivated by an interesting problem in ergodic theory: given a dynamical system $T : \Omega \to \Omega$, where $(\Omega, d)$ is a compact metric space, $\mathcal{A}$ is the Borel sigma-algebra on $\Omega$ and $T$ is a continuous map, and given a fixed probability measure $\mu$ on $\mathcal{A}$ (which does not need to have any relation to the dynamics $T$), one wants to obtain the $T$-invariant measure that minimizes the Wasserstein-2 distance to $\mu$.

If $\nu$ is also a probability measure on $\mathcal{A}$, we denote by $\Pi(\mu, \nu)$ the set of probability measures on $\Omega \times \Omega$ whose projections on the two coordinates are, respectively, $\mu$ and $\nu$. The Wasserstein-2 distance between these measures is given by

$$W_2(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int_{\Omega \times \Omega} d(x, y)^2 \, d\pi(x, y) \right)^{1/2}.$$

It is known that this distance metrizes the weak-convergence topology on the space of probability measures on $\mathcal{A}$ (see [9] for details).

Now, let $\Pi(\mu, T)$ be the set of probability measures on $\Omega \times \Omega$ whose projections on the first coordinate are equal to $\mu$ and whose projections on the second coordinate are $T$-invariant. If one solves the constrained optimization problem

$$\inf_{\pi \in \Pi(\mu, T)} \int_{\Omega \times \Omega} d(x, y)^2 \, d\pi(x, y),$$

one obtains (by projecting the solution on the second coordinate) the $T$-invariant measure that is closest to $\mu$ (according to the Wasserstein metric). A measure $\pi$ that solves the minimization problem above is called an optimal transport plan.

Ergodic transport was studied in a more general setting in previous work, which considers probability measures on $X \times \Omega$, where $X$ is a set that can be different from $\Omega$. If $\mu$ is a probability measure on $X$, and $T : \Omega \to \Omega$ is a dynamical system defined on $\Omega$, we denote by $\Pi(\mu, T)$ the set of probability measures on $X \times \Omega$ that project (in the first coordinate) on $\mu$, while projecting, in the second coordinate, on any invariant measure for $T : \Omega \to \Omega$. See Section 3 for the precise definition of $\Pi(\mu, T)$. Note that different measures in $\Pi(\mu, T)$ can project on different invariant measures for $T$. In [3], one is interested in obtaining a measure that attains

$$\sup_{\pi \in \Pi(\mu, T)} \int A(x, y) \, d\pi(x, y),$$

where $A : X \times \Omega \to \mathbb{R}$ is called a potential function. If $X$ is a singleton (has only one point), the maximization above is the same as the one found in ergodic optimization problems, and for this reason we can see ergodic transport, in some sense, as a generalization of ergodic optimization. Note that, if we compare the problem above with the one concerning the measure that minimizes the Wasserstein distance, we see that here we are interested in maximizing, rather than minimizing, the integral of a function. This does not pose any conceptual difference in the theory: if one wants to minimize, one just has to change the sign of the potential.

After optimal plans were introduced, the entropy of plans was defined in [5] by means of what can be called the Jacobian of the measure. If we define a functional by adding the integral of the potential with respect to some plan to the entropy of this plan, and try to maximize this functional, we have what is called a pressure problem (in analogy to classical thermodynamical formalism). One can think of this as thermodynamic formalism where the potential is random due to the choice of a fixed probability $\mu$. A variational principle was obtained in [5] (see Section 3 for more details on the results of [5] that are needed here). The plan that satisfies the variational principle is called an equilibrium plan, and is related to the fixed point of a Ruelle-Perron-Frobenius-type operator (transfer operator). Moreover, as a relation between equilibrium plans and optimal plans, it has been proved that optimal plans can be obtained as weak limits of equilibrium plans, if we multiply the cost by a constant $\beta$ and send $\beta \to \infty$. This is called the zero temperature limit (also in analogy to classical thermodynamical formalism).

In Section 3 we review the basic results in ergodic transport, and some details of the results stated above.

Section 4 presents the main objective of this paper, which is to characterize, via an ergodic theorem, the equilibrium plan that solves the pressure problem introduced in [5]. This theorem plays the same role in ergodic transport as the Ergodic Theorem in classical ergodic theory.

Such characterisation will be done by constructing a stochastic process whose empirical measure converges to the equilibrium plan, in the sense that the mean of any test function evaluated at the outcomes of this stochastic process converges to the integral of the test function with respect to the equilibrium plan. We provide examples and also show explicit calculations in the case where $X$ has two elements and the cost function depends on the first two coordinates of the points of $\Omega$.

The main tool in getting this characterization is a result due to Elton, concerning an ergodic theorem for Markov processes defined by iterated function systems (IFS), which is stated below in an adapted form that is appropriate for our purposes:

Theorem 1 (Elton - 1987) Let $Z$ be a compact metric space, and $\tau_i : Z \to Z$, for $1 \leq i \leq d$, be a finite number of Lipschitz contractive maps. Let $p_i : Z \to (0, 1]$ be Lipschitz continuous weight functions, and suppose $\sum_{i=1}^{d} p_i(z) = 1$ for all $z \in Z$. Let $\nu$ be a probability measure on $Z$ that satisfies, for any Borel-measurable set $B$,

$$\nu(B) = \int_Z \sum_{i : \tau_i(z) \in B} p_i(z) \, d\nu(z). \tag{1}$$

Then, for any $z_0 \in Z$, if we define by recurrence

$$z_{k+1} = \tau_i(z_k) \quad \text{with probability } p_i(z_k), \tag{2}$$

we have that, for any continuous function $f : Z \to \mathbb{R}$, almost surely,

$$\frac{1}{N} \sum_{k=0}^{N-1} f(z_k) \to \int_Z f \, d\nu \quad \text{as } N \to \infty. \tag{3}$$

We will soon explain exactly what almost sure convergence in (3) means. Before that, some considerations are necessary: a finite set of maps defined on $Z$, together with the transition probabilities $p_i : Z \to (0, 1]$, defines what is called an iterated function system (IFS). In this paper we will call the maps $\tau_i$ Elton maps.

If we define the transition kernel

$$P(z, B) = \sum_{i : \tau_i(z) \in B} p_i(z),$$

then (1) means that $\nu$ is the invariant measure for the Markov process $\{z_k\}_{k \in \mathbb{N}}$ defined on $Z$ by (2).

Elton's theorem implies that the law of large numbers holds for the process $f(z_k)$.

Note that the initial $z_0 \in Z$ can be any point in $Z$: the time average in (3) does not depend on $z_0$.

Let us explain the meaning of almost sure convergence in (3): we begin by denoting by $\Omega = \{1, \dots, d\}^{\mathbb{N}}$ the Bernoulli set of $d$ symbols, with the $\sigma$-algebra generated by the cylinders. According to [2], for any $z \in Z$ there exist (i) a probability $P_z$ on $\Omega = \{1, \dots, d\}^{\mathbb{N}}$, which is given on cylinders by

$$P_z(i_1, i_2, \dots, i_k) = p_{i_1}(z) \, p_{i_2}(\tau_{i_1}(z)) \, p_{i_3}(\tau_{i_2}(\tau_{i_1}(z))) \cdots p_{i_k}(\tau_{i_{k-1}}(\cdots(\tau_{i_1}(z))\cdots)),$$

and (ii) a set $G_z \subset \Omega$ such that $P_z(G_z) = 1$. Now, [3] proves that, if the address sequence $(i_1, i_2, \dots)$ belongs to $G_z$, then the sequence defined recursively by

$$z_0 = z, \qquad z_{k+1} = \tau_{i_{k+1}}(z_k) \quad \text{for } k \geq 0,$$

is such that (3) holds for any continuous function $f : Z \to \mathbb{R}$.
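The cylinder formula for $P_z$ can be evaluated by following the orbit of $z$ along the address word. The sketch below is only an illustration: the two affine maps and the place-dependent weights are hypothetical choices (not taken from the paper), and symbols are indexed from 0 rather than 1.

```python
from itertools import product

def cylinder_probability(z, word, taus, probs):
    """P_z(i_1, ..., i_k): multiply p_{i_1}(z), p_{i_2}(tau_{i_1}(z)), ... along the orbit."""
    total = 1.0
    for i in word:
        total *= probs[i](z)   # weight of choosing map i at the current point
        z = taus[i](z)         # move to the next point of the orbit
    return total

# Hypothetical IFS on Z = [0, 1]: two affine contractions, weights in (0, 1] summing to 1.
taus = [lambda z: z / 2.0, lambda z: z / 2.0 + 0.5]
probs = [lambda z: 0.25 + 0.5 * z, lambda z: 0.75 - 0.5 * z]

z0 = 0.3
# Summing P_z over all cylinders of a fixed length must give total mass 1.
total_mass = sum(cylinder_probability(z0, w, taus, probs)
                 for w in product(range(2), repeat=4))
```

Because the weights sum to 1 at every point, the cylinder probabilities of any fixed length add up to 1, which is the consistency needed for $P_z$ to define a probability on the address space.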

In J. Elton's result the orbits go backward and not forward as in the classical Birkhoff 
theorem. 

As a simple application of Elton's result, let $Z = \{1, 2, \dots, d\}$ and $\tau_i \equiv i$ (the constant map). Then $z_k$ defined by (2) is the usual Markov chain associated to the transition matrix $P_{ij} = p_j(i)$. If $P_{ij} > 0$ for all $i, j$, then Elton's theorem implies, as a particular case, the law of large numbers for the Markov chain $z_k$.
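This particular case is easy to simulate. The following minimal sketch uses a hypothetical $2 \times 2$ transition matrix (states indexed from 0) and compares the visit frequencies of the chain with the invariant vector, which for this matrix is $(5/6, 1/6)$.

```python
import random

def simulate_chain(P, z0, n_steps, rng):
    """Run z_{k+1} = j with probability p_j(z_k) = P[z_k][j]; return visit frequencies."""
    d = len(P)
    counts = [0] * d
    z = z0
    for _ in range(n_steps):
        counts[z] += 1
        z = rng.choices(range(d), weights=P[z])[0]
    return [c / n_steps for c in counts]

P = [[0.9, 0.1],
     [0.5, 0.5]]            # hypothetical transition matrix, all entries positive
rng = random.Random(0)      # fixed seed so that the run is reproducible
freq = simulate_chain(P, z0=1, n_steps=200_000, rng=rng)
# The invariant vector solves pi P = pi; here pi = (5/6, 1/6).
```

The starting state plays no role in the limit, in agreement with the remark that the time average does not depend on $z_0$.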

Elton's result was proved in a slightly more general form, where the maps $\tau_i$ only need to be contractive 'on the average', and $Z$ need not be compact (see details in [2]). The invariant measure $\nu$ that satisfies (1) was proved to be unique by some of the authors cited in [3]. More recently, a very simple proof of Theorem 1 was provided in [3], in the case where the $p_i$ are constant and $Z$ is compact.

Before applying Elton's result to characterize equilibrium plans in ergodic transport, we will consider, in Section 2, an analogous problem in classical thermodynamical formalism: we know that the action of the equilibrium measure on test functions can be obtained by a limit procedure considering pre-images of points (via the Ruelle operator, see Section 2). However, this procedure is not efficient in computational terms, because the number of preimages grows exponentially (see Section 2). The method we propose, using Elton's theorem, is much more efficient because, using a Markovian stochastic process defined via an IFS, we get a law of large numbers that gives the integral of test functions as the limit of the mean of such functions evaluated at the outcomes of the process (see (4)).

2 Ergodic theorem in classical thermodynamical formalism

In this section we recall the basic facts of classical thermodynamical formalism and then use Elton's ergodic theorem to characterize Gibbs measures for the shift on the Bernoulli set of symbols. The results of this section are of independent interest, and will not be used in the following sections.

Let $\Omega = \{1, \dots, d\}^{\mathbb{N}}$ be the Bernoulli set on $d$ symbols, where $d \in \mathbb{N}$, $d \geq 2$. We know (Tychonoff's theorem) that $\Omega$ is a compact (metric) set. We will consider the sigma-algebra generated by the cylinders, which is the Borel sigma-algebra. The dynamics $T$ here is given by the shift map on $\Omega$.

The Ruelle-Perron-Frobenius (RPF) operator (also known as transfer operator) associated to a Lipschitz potential $A : \Omega \to \mathbb{R}$ is the operator that associates to any continuous function $\varphi : \Omega \to \mathbb{R}$ the continuous function $L_A(\varphi)$ given by

$$L_A(\varphi)(z) = \sum_{T(w) = z} e^{A(w)} \varphi(w).$$

A Lipschitz function $A : \Omega \to \mathbb{R}$ is called a normalized potential if $L_A(1) = 1$. It is known that the RPF operator has a maximal eigenvalue $\lambda_A > 0$, which is simple, associated to an eigenfunction $\psi_A$, which is Lipschitz and positive (a simple eigenvalue means an eigenvalue whose associated eigenspace has dimension 1). If $A$ is non-normalized then it can be normalized by considering

$$\bar{A} = A + \log \psi_A - \log \psi_A \circ T - \log \lambda_A.$$

The dual RPF operator, denoted by $L_A^*$, acts on probability measures on $\Omega$, and is defined by

$$\int \psi \, d L_A^*(\mu) = \int L_A(\psi) \, d\mu$$

for every continuous function $\psi : \Omega \to \mathbb{R}$.

If $\bar{A}$ is the normalized potential associated to $A$, we know that the dual operator $L_{\bar{A}}^*$ preserves the simplex of probabilities and has a unique fixed probability $\mu_A$ (i.e. a probability $\mu_A$ that satisfies $L_{\bar{A}}^*(\mu_A) = \mu_A$), called the Gibbs state associated to $A$, which is invariant and ergodic for $T$, and satisfies

$$\int A \, d\mu_A + h(\mu_A) = \sup_{\mu \in \mathcal{M}_T(\Omega)} \left\{ \int A \, d\mu + h(\mu) \right\}.$$

($\mathcal{M}_T(\Omega)$ here means the set of invariant probabilities for $T$, and $h(\mu)$ denotes the entropy of $\mu$.) The right side of the equation above is called the pressure of $A$. We know that $\mu_A$ is the unique invariant measure that attains the maximum defining the pressure. The unique measure that attains this maximum is called the equilibrium measure for $A$; it is ergodic, gives positive mass to open sets, and is given by the Gibbs measure $\mu_A$. The last result is also known as the variational principle for pressure.

Now an interesting question is: how can we characterize the Gibbs measure $\mu_A$?

More precisely, is it possible to calculate $\int u \, d\mu_A$ for any test function $u : \Omega \to \mathbb{R}$?

A partial answer is given by the fact that, for any Lipschitz test function $u : \Omega \to \mathbb{R}$, we have

$$L_{\bar{A}}^n(u)(z) \to \int u \, d\mu_A$$

when $n \to \infty$, where

$$L_{\bar{A}}^n(u)(z) = \sum_{T^n(w) = z} e^{S_n(\bar{A})(w)} u(w),$$

and $S_n(\bar{A}) = \sum_{k=0}^{n-1} \bar{A} \circ T^k$ is the Birkhoff sum of order $n$ of the normalized potential $\bar{A}$. The convergence above holds in the uniform convergence topology.

However, the limit procedure above is not of practical use because it involves the evaluation of $u$ at the inverse images of order $n$, with $n \to \infty$, of some point of $\Omega$, and such a calculation is not practical to implement: for the shift on $d$ symbols, we have $d^n$ inverse images of order $n$. To make things worse, we have to evaluate the Birkhoff sum of order $n$ of the potential at each one of the $d^n$ inverse images of the chosen point.
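To make the cost concrete, the sketch below evaluates the $n$-th iterate of the transfer operator by brute force, looping over all $d^n$ preimages. It assumes a hypothetical normalized potential that depends only on the first two coordinates, so that points of $\Omega$ can be represented by finite tuples; symbols are indexed from 0.

```python
from itertools import product
import math

def ruelle_iterate(u, A, z, n, d):
    """Evaluate L_A^n(u)(z) by summing over all d**n preimages w z of z under T^n."""
    total = 0.0
    for word in product(range(d), repeat=n):
        point = word + z                                  # preimage (w_1, ..., w_n, z_1, z_2)
        birkhoff = sum(A(point[k:]) for k in range(n))    # S_n(A) at the preimage
        total += math.exp(birkhoff) * u(point)
    return total

# Hypothetical normalized potential: exp(A(w)) = M[w_2][w_1], with the rows of M summing to 1.
M = [[0.9, 0.1], [0.5, 0.5]]
A = lambda w: math.log(M[w[1]][w[0]])
u = lambda w: 1.0 if w[0] == 0 else 0.0                   # indicator of the cylinder [0]

value = ruelle_iterate(u, A, z=(0, 0), n=12, d=2)         # loops over 2**12 preimages
```

For this choice the Gibbs measure of the cylinder $[0]$ is $5/6$, and the computed value agrees with it up to an error of order $0.4^{12}$; the price is a loop whose length grows like $d^n$.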

Therefore, another way of calculating $\int u \, d\mu_A$ is needed. This can be accomplished by using Elton's theorem. In this way we get a much more efficient procedure, because what we obtain is a Markovian stochastic process (that can be easily simulated by a Monte Carlo method) together with a law of large numbers that yields the integral of any test function, as shown in equation (4).

Suppose $A : \Omega \to \mathbb{R}$ is normalized, i.e. $\sum_{i=1}^{d} e^{A(iz)} = 1$ for all $z \in \Omega$.

Let $\tau_i(z) = iz$ be the Elton maps, and let the transition probabilities be given by $p_i(z) = e^{A(iz)}$.

The normalization hypothesis on $A$ implies that $\sum_{i} p_i(z) = 1$ for any $z$, and the Lipschitz continuity of $A$ implies that $p_i$ is Lipschitz.

Let $\mu$ be the probability measure that satisfies (1) in Elton's theorem. Let $B$ be any measurable set of $\Omega$. Then

$$\mu(B) = \int \sum_{i : \tau_i(z) \in B} p_i(z) \, d\mu(z) = \int \sum_{i : iz \in B} e^{A(iz)} \, d\mu(z) = \int \sum_{i=1}^{d} e^{A(iz)} 1_B(iz) \, d\mu(z) = \int L_A(1_B) \, d\mu = L_A^*(\mu)(B),$$

and therefore $\mu$ is the fixed point of the dual Ruelle operator.

As a result, the Elton measure $\mu$ coincides with the Gibbs measure $\mu_A$ and we have

Theorem 2 (Birkhoff-Elton Theorem in classical thermodynamical formalism) If we choose any $z_0 \in \Omega$, and then, by recurrence, choose

$$z_{k+1} = i z_k \quad \text{with probability } p_i(z_k) = e^{A(i z_k)},$$

then, for any continuous test function $f : \Omega \to \mathbb{R}$,

$$\frac{1}{N} \sum_{k=0}^{N-1} f(z_k) \to \int f \, d\mu_A \tag{4}$$

with probability one.
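The recurrence of Theorem 2 can be simulated directly. The sketch below is an illustration under extra assumptions that are not part of the theorem: the potential and the test function depend on finitely many coordinates, so only a finite prefix of $z_k$ is stored, and the normalized potential is a hypothetical one built from a row-stochastic matrix `M` (symbols indexed from 0).

```python
import math
import random
from collections import deque

def birkhoff_elton_average(f, A, d, z0, n_steps, rng, memory=8):
    """Average f along z_{k+1} = i z_k, where i is drawn with probability exp(A(i z_k)).

    Only the first `memory` coordinates of z_k are kept, which suffices when
    A and f depend on finitely many coordinates (an assumption of this sketch).
    """
    z = deque(z0, maxlen=memory)
    total = 0.0
    for _ in range(n_steps):
        total += f(z)
        weights = [math.exp(A((i, *z))) for i in range(d)]
        i = rng.choices(range(d), weights=weights)[0]
        z.appendleft(i)                  # z_{k+1} = i z_k; the oldest coordinate is dropped
    return total / n_steps

# Hypothetical normalized potential: exp(A(w)) = M[w_2][w_1], with the rows of M summing to 1.
M = [[0.9, 0.1], [0.5, 0.5]]
A = lambda w: math.log(M[w[1]][w[0]])
f = lambda z: 1.0 if (z[0], z[1]) == (0, 1) else 0.0      # indicator of the cylinder [0, 1]

rng = random.Random(1)
estimate = birkhoff_elton_average(f, A, d=2, z0=(0, 0), n_steps=200_000, rng=rng)
```

For this potential the Gibbs measure of the cylinder $[0, 1]$ equals $\frac{1}{6} \cdot \frac{1}{2} = \frac{1}{12}$, and the Monte Carlo average approaches that value with a number of operations linear in $N$, in contrast with the $d^n$ preimages needed by the transfer operator iteration.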


3 Ergodic Transport 

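Now we recall the main concepts of Ergodic Transport. See [5] for more details.

Let $X$ be a finite set and $\Omega = \{1, \dots, d\}^{\mathbb{N}}$ the Bernoulli space on $d$ symbols, with shift map $\sigma$.

We denote the RPF operator associated to a Lipschitz cost (potential) $c : X \times \Omega \to \mathbb{R}$ as

$$L_c(v)(y) = \sum_{x \in X} \sum_{\sigma(w) = y} e^{c(x, w)} v(w) = \sum_{\sigma(w) = y} \left( \sum_{x \in X} e^{c(x, w)} \right) v(w)$$

for any $v \in C(\Omega)$. We remark that this is the classical RPF operator associated to the potential $\phi_c(y) = \log\left( \sum_{x \in X} e^{c(x, y)} \right)$. We know that $L_c$ has a maximal eigenvalue $\lambda_c$, which is simple and positive, and there is a positive eigenfunction $h_c$ associated to $\lambda_c$, see [8].

The RPF operator extended to the continuous functions on $X \times \Omega$ is defined as

$$L_c(u)(y) = \sum_{x \in X} \sum_{\sigma(w) = y} e^{c(x, w)} u(x, w)$$

for any $u : X \times \Omega \to \mathbb{R}$.

Note that $L_c$ sends $u : X \times \Omega \to \mathbb{R}$ to the function denoted by $L_c(u) : \Omega \to \mathbb{R}$.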

Definition 1 We say that a Lipschitz cost (potential) $c : X \times \Omega \to \mathbb{R}$ is normalized if, for any $y \in \Omega$, we have

$$\sum_{x \in X} \sum_{\sigma(w) = y} e^{c(x, w)} = 1.$$

If $c$ is a Lipschitz cost that is not normalized, we associate to $c$ the normalized cost

$$\bar{c}(x, y) = c(x, y) + \log(h_c(y)) - \log(h_c \circ \sigma(y)) - \log(\lambda_c), \tag{5}$$

where $h_c$ is the positive eigenfunction associated to the maximal eigenvalue $\lambda_c$ of $L_c$.

If $c$ is normalized, we denote by $L_c^*$ the operator on $P(X \times \Omega)$ defined by

$$L_c^*(\pi)(u(x, y)) = \int_{X \times \Omega} \left( \sum_{a \in X} \sum_{\sigma(w) = y} e^{c(a, w)} u(a, w) \right) d\pi(x, y). \tag{6}$$

The normalization property implies that $L_c^*(\pi)(1) = 1$, i.e. $L_c^*$ preserves the convex and compact set $P(X \times \Omega)$, and the Tychonoff-Schauder theorem implies the following.

Proposition 1 Given a normalized cost $c$ there exists a unique fixed point for the operator $L_c^*$. It will be denoted by $\pi_c$.

Therefore, we have

$$\pi_c(u(x, y)) = \int_{X \times \Omega} \left( \sum_{a \in X} \sum_{\sigma(w) = y} e^{c(a, w)} u(a, w) \right) d\pi_c(x, y). \tag{7}$$

Definition 2 The fixed point $\pi_c$ for $L_c^*$ is called the Gibbs plan for the normalized cost (potential) $c$.

We denote by $\Pi(\cdot, \sigma)$ the set of plans whose $y$-marginal is $\sigma$-invariant, i.e.,

$$\int_{X \times \Omega} g(y) \, d\pi(x, y) = \int_{X \times \Omega} g(\sigma(y)) \, d\pi(x, y) \quad \text{for any } g \in C(\Omega). \tag{8}$$

By Theorem 4 of [5], we have that $\pi_c \in \Pi(\cdot, \sigma)$ and the $y$-marginal of $\pi_c$ is the Gibbs measure $\nu_c$ for the classical RPF operator with the potential $\phi_c(y) = \log\left( \sum_{x \in X} e^{c(x, y)} \right)$.


We denote by $[x, y_1 \dots y_n] = \{(a, w) \in X \times \Omega : a = x, w_1 = y_1, \dots, w_n = y_n\}$ and $[y_2 \dots y_n] = \{w \in \Omega : w_1 = y_2, \dots, w_{n-1} = y_n\}$. Consider a fixed plan $\pi \in \Pi(\cdot, \sigma)$ with $y$-marginal $\nu$ and define

$$J_\pi^n(x, y) = \frac{\pi([x, y_1 \dots y_n])}{\nu([y_2 \dots y_n])}$$

if $(y_2, y_3, \dots) \in \operatorname{supp}(\nu)$. From the Increasing Martingale Theorem the functions $J_\pi^n$ converge to a function $J_\pi(x, y)$ in $L^1(X \times \Omega, \mathcal{A}(X \times \Omega), \pi)$ and for $\pi$-a.e. $(x, y)$. For each plan $\pi$ this function $J_\pi$ can also be obtained via the Radon-Nikodym Theorem. We have $J_\pi > 0$ a.e. ($\pi$) and $\sum_{x \in X} \sum_{\sigma(w) = y} J_\pi(x, w) = 1$.

We define the entropy of a plan $\pi$ as

$$H(\pi) = - \int \log(J_\pi) \, d\pi.$$


Definition 3 The pressure of a Lipschitz continuous cost (potential) $c$ is defined by

$$P(c) = \sup_{\pi \in \Pi(\cdot, \sigma)} \left( \int_{X \times \Omega} c \, d\pi + H(\pi) \right).$$

A plan $\pi \in \Pi(\cdot, \sigma)$ which realizes the supremum is called an equilibrium plan for $c$.

Theorem 3 (Variational Principle over $\Pi(\cdot, \sigma)$) Let us fix a Lipschitz cost $c$. Then $P(c) = \log(\lambda_c)$, where $\lambda_c$ is the main eigenvalue of $L_c$. The equilibrium plan for $c$ is unique and given by the Gibbs plan for the normalized cost $\bar{c} := c + \log(h_c) - \log(h_c \circ \sigma) - \log(\lambda_c)$, where $h_c$ is the positive eigenfunction associated to $\lambda_c$.


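Now let us fix a probability $\mu$ on $X$. We define the $\mu$-pressure of $c$ by

$$P_\mu(c) = \sup_{\pi \in \Pi(\mu, \sigma)} \int_{X \times \Omega} c \, d\pi + H(\pi), \tag{9}$$

where $\Pi(\mu, \sigma)$ is the set of all plans satisfying

$$\begin{cases} \int_{X \times \Omega} f(x) \, d\pi(x, y) = \int_X f(x) \, d\mu(x) & \text{for any } f \in C(X), \\ \int_{X \times \Omega} g(y) \, d\pi(x, y) = \int_{X \times \Omega} g(\sigma(y)) \, d\pi(x, y) & \text{for any } g \in C(\Omega), \end{cases} \tag{10}$$

that is, the set of probabilities $\pi$ such that the $x$-marginal of $\pi$ is the fixed probability $\mu \in P(X)$ and the $y$-marginal of $\pi$ is $\sigma$-invariant.

Note that $P_\mu(c) \leq P(c)$.

By compactness, there exists a plan $\pi_c \in \Pi(\mu, \sigma)$ which attains the supremum in (9).

We have the following duality result: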
Theorem 4 Given a Lipschitz cost $c$ and a probability $\mu$ on $X$, we have

$$\inf_{\varphi : P(c - \varphi) = 0} \int_X \varphi \, d\mu = \sup_{\pi \in \Pi(\mu, \sigma)} \int_{X \times \Omega} c(x, y) \, d\pi + H(\pi). \tag{11}$$

The minimization is performed on the set of functions $\varphi : X \to \mathbb{R}$ such that $P(c - \varphi) = 0$. The supremum in (11) is attained in at least one plan, while the infimum is attained in exactly one function $\varphi$.

Moreover, if $\varphi : X \to \mathbb{R}$ is the unique minimizer of (11), then $P(c - \varphi) = 0$ and also

$$\int_X \varphi \, d\mu = \sup_{\pi \in \Pi(\mu, \sigma)} \int_{X \times \Omega} c(x, y) \, d\pi + H(\pi),$$

which implies

$$\sup_{\pi \in \Pi(\mu, \sigma)} \int_{X \times \Omega} (c(x, y) - \varphi) \, d\pi + H(\pi) = 0,$$

which means $P_\mu(c - \varphi) = 0$.

Therefore $P(c - \varphi) = P_\mu(c - \varphi) = 0$, and the maximizer of (11) is the equilibrium plan for $c - \varphi$, i.e. it satisfies the non-constrained variational principle. We have just proved the following.

Corollary 1 Let $\varphi : X \to \mathbb{R}$ be the unique minimizer for the Fenchel-Rockafellar duality (11). The equilibrium plan $\pi_{c - \varphi}$, for $c - \varphi$, belongs to $\Pi(\mu, \sigma)$ and is the unique maximizer of (11).

As a final remark in this section, let us observe that it has been shown, among other things, how equilibrium plans can be used to obtain the optimal transport plan, by a limit procedure where the cost $c$ is multiplied by a constant $\beta$ (the inverse of the temperature) and the parameter $\beta \to \infty$.
4 An ergodic theorem in ergodic transport 

Now we discuss the main goal of this paper: the characterization, via Elton's theorem, of the plan that attains the supremum

$$\sup_{\pi \in \Pi(\mu, \sigma)} \int_{X \times \Omega} c(x, y) \, d\pi + H(\pi),$$

where $X = \{1, 2\}$, $\mu$ is a probability measure on $X$, $c : X \times \Omega \to \mathbb{R}$ depends on $x \in X$ and only on the first two coordinates of $y \in \Omega$, and $\Pi(\mu, \sigma)$ is given by (10). Note that we are dealing with the constrained optimization problem, i.e. we search for an optimising measure among those measures that project on a fixed probability measure $\mu$ on $X$. Any such measure is given by $\mu = (p, 1 - p)$, where $0 < p < 1$.

By Corollary 1 the plan is the equilibrium plan $\pi_{c - \varphi}$, where $\varphi$ is the minimizer of (11). Note that this minimization is performed on the set of functions $\varphi : X \to \mathbb{R}$ such that $P(c - \varphi) = 0$.

We will do such characterisation in three steps:

Step 1 We need to find the minimizer $\varphi = (\varphi_1, \varphi_2) \in \mathbb{R}^2$ that solves the minimization problem (11), i.e., that solves

$$\inf_{\varphi : P(c - \varphi) = 0} \int_X \varphi \, d\mu.$$

Such a minimizer exists and is unique (Theorem 4).

Step 2 If $A = c - \varphi$, we need to find the maximal eigenvalue $\lambda_A$ for the RPF operator $L_A$, as well as the associated eigenfunction $h_A$. Once we have that, we define the normalized potential $\bar{A} = A + \log(h_A) - \log(h_A \circ \sigma) - \log(\lambda_A)$, which can be reduced to

$$\bar{A} = A + \log(h_A) - \log(h_A \circ \sigma),$$

as $\log(\lambda_A) = P(A) = P(c - \varphi) = 0$.

Step 3 We use Elton's Theorem to characterize the constrained equilibrium measure $\pi_{c,\mu}$, showing how one can get the integral of any test function by means of a stochastic simulation (a Monte Carlo method). More precisely, for any continuous function $f : X \times \Omega \to \mathbb{R}$, we show that, almost surely,

$$\frac{1}{N} \sum_{k=0}^{N-1} f(x^k, y^k) \to \int f \, d\pi_{c,\mu}$$

for a well-chosen sequence $(x^k, y^k)$, obtained by a probabilistic algorithm.



Analysis of Step 3:

Now we suppose Steps 1 and 2 were successfully performed (the analysis of these first two steps will be done after Theorem 5). We proceed to Step 3:

By Corollary 1 we know that $\pi_{c,\mu}$ is the equilibrium measure $\pi_A$ (that is, $\pi_{c - \varphi}$), and by Theorem 3 we have that $\pi_A$ is the Gibbs plan for $\bar{A}$, i.e.,

$$L_{\bar{A}}^*(\pi_A) = \pi_A.$$

Now we want to show that $\pi_A$ (i.e. $\pi_{c,\mu}$) satisfies the hypothesis of Theorem 1. Let us take, following the notation used in Elton's Theorem 1, $Z = X \times \Omega$. Then we define, for all $(a, i) \in X \times \{1, \dots, d\}$ and for all $(x, y) \in Z$, the weight function

$$p_{a,i}(x, y) = e^{\bar{A}(a, iy)}.$$

Note that $p_{a,i}$ depends only on $y$.

As $\bar{A}$ is normalized we have

$$1 = \sum_{a \in X} \sum_{1 \leq i \leq d} e^{\bar{A}(a, iy)} = \sum_{a \in X, \, 1 \leq i \leq d} p_{a,i}(x, y), \quad \forall (x, y) \in Z.$$

For each $a \in X$, $1 \leq i \leq d$, we define the Elton map

$$\tau_{a,i}(x, y) = (a, iy).$$

Finally, if $B$ is a Borel set of $Z = X \times \Omega$, and $\chi_B$ is the characteristic function of $B$, we have, by equation (7), that

$$\pi_A(B) = \int L_{\bar{A}}(\chi_B) \, d\pi_A = \int \sum_{a \in X, \, 1 \leq i \leq d} p_{a,i}(x, y) \chi_B(a, iy) \, d\pi_A(x, y) = \int \sum_{\tau_{a,i}(x, y) \in B} p_{a,i}(x, y) \, d\pi_A(x, y),$$

which means that $\pi_A$ satisfies (1) and Elton's Theorem holds for the stochastic process generated by $p_{a,i}$ and $\tau_{a,i}$. Therefore, we have proved the following theorem:

Theorem 5 [Birkhoff-Elton Theorem in Ergodic Transport] Fix any $(x^0, y^0) \in X \times \Omega$. Define by recurrence

$$(x^{k+1}, y^{k+1}) = (a, i y^k) \quad \text{with probability } e^{\bar{A}(a, i y^k)},$$

where $\bar{A}$ is the normalized cost associated to $A = c - \varphi$, and $\varphi$ is the only minimizer of the Fenchel-Rockafellar duality equation (11). Then, for any continuous function $f : X \times \Omega \to \mathbb{R}$, we have, almost surely,

$$\frac{1}{N} \sum_{k=0}^{N-1} f(x^k, y^k) \to \int f \, d\pi_A,$$

where $\pi_A = \pi_{c,\mu}$ is the constrained maximizer of the duality equation (11).
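The recurrence of Theorem 5 can be sketched as follows when the normalized cost depends on $x$ and on the first two coordinates of $y$, so that only the first coordinate of $y^k$ has to be remembered. The table `weights[a][i][j]`, standing for $e^{\bar{A}(a, i, j)}$, is an input of the sketch. The example values are hypothetical and correspond to the stochastic case of Theorem 6(b) below, where a short computation (an assumption of this illustration, relying on $h_A$ being constant in that case) gives $e^{\bar{A}(a, i, j)} = \mu_a C^a_{ij}$. Indices start at 0.

```python
import random

def simulate_plan(weights, n_steps, rng, x0=0, y0=0):
    """Run (x, y) -> (a, i y) with probability weights[a][i][y_1] and return
    the empirical frequencies of the pairs (x, y_1)."""
    n_x, d = len(weights), len(weights[0])
    moves = [(a, i) for a in range(n_x) for i in range(d)]
    counts = {}
    x, y1 = x0, y0
    for _ in range(n_steps):
        counts[(x, y1)] = counts.get((x, y1), 0) + 1
        probs = [weights[a][i][y1] for (a, i) in moves]
        x, y1 = rng.choices(moves, weights=probs)[0]   # the new first coordinate of y is i
    return {key: c / n_steps for key, c in counts.items()}

# Hypothetical data: mu = (p, 1 - p) and column-stochastic matrices C[a][i][j].
p = 0.4
mu = [p, 1.0 - p]
C = [[[0.7, 0.2], [0.3, 0.8]],
     [[0.6, 0.3], [0.4, 0.7]]]
weights = [[[mu[a] * C[a][i][j] for j in range(2)] for i in range(2)] for a in range(2)]

rng = random.Random(3)
freq = simulate_plan(weights, n_steps=200_000, rng=rng)
x_marginal = sum(v for (x, _), v in freq.items() if x == 0)
```

In this example the empirical $x$-marginal approaches $p$, as it must for a plan in $\Pi(\mu, \sigma)$.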
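Analysis of Steps 1 and 2:

We will consider here the case $d = 2$, i.e., $\Omega = \{1, 2\}^{\mathbb{N}}$. As $c$ depends on $x$ and only on the first two coordinates of $y$, that is, $c(x, y) = c(x, y_1, y_2)$, it is represented by the following matrices:

$$C^1 = \begin{pmatrix} e^{c^1_{11}} & e^{c^1_{12}} \\ e^{c^1_{21}} & e^{c^1_{22}} \end{pmatrix}, \qquad C^2 = \begin{pmatrix} e^{c^2_{11}} & e^{c^2_{12}} \\ e^{c^2_{21}} & e^{c^2_{22}} \end{pmatrix},$$

where $C^x_{ij} = e^{c^x_{ij}} = e^{c(x, i, j)}$, $x, i, j = 1, 2$.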

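Note that the operator $L_c : C(\Omega) \to C(\Omega)$, for any $v : \Omega \to \mathbb{R}$, can be rewritten as

$$L_c(v)(y) = \sum_{i \in \{1, 2\}} \sum_{x \in X} e^{c(x, iy)} v(iy) = \sum_{i \in \{1, 2\}} e^{\phi_c(iy)} v(iy), \tag{12}$$

where $\phi_c(y) = \log \sum_{x \in X} e^{c(x, y)}$. Then $\phi_c : \Omega \to \mathbb{R}$ depends on the first two coordinates of $y$ and is represented by the matrix

$$B = \begin{pmatrix} e^{c^1_{11}} + e^{c^2_{11}} & e^{c^1_{12}} + e^{c^2_{12}} \\ e^{c^1_{21}} + e^{c^2_{21}} & e^{c^1_{22}} + e^{c^2_{22}} \end{pmatrix}. \tag{13}$$

It is a well known fact, see [5], that the positive eigenfunction, which we denote by $v_c$, associated to the main eigenvalue $\lambda_c$ of $L_c$ depends only on the first coordinate of $y$, and this means that $v_c$ is in fact a vector of $\mathbb{R}^2$.

More precisely, $v_c$ is the left eigenvector of $B$ associated to the eigenvalue $\lambda_c$, because

$$L_c(v_c)(y) = \lambda_c v_c(y) \iff v_c B = \lambda_c v_c. \tag{14}$$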

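Let also

$$C^{12} = \begin{pmatrix} e^{c^1_{11}} & e^{c^1_{12}} \\ e^{c^2_{21}} & e^{c^2_{22}} \end{pmatrix}, \qquad C^{21} = \begin{pmatrix} e^{c^2_{11}} & e^{c^2_{12}} \\ e^{c^1_{21}} & e^{c^1_{22}} \end{pmatrix}.$$

The next theorem shows how to perform Step 1.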

Theorem 6 Let $\mu = (p, 1 - p)$. Suppose $c : X \times \Omega \to \mathbb{R}$ is a Lipschitz cost that depends on $x$ and just on the first two coordinates of $y$.

(a) Then the unique minimizer $\varphi$ of the left-hand side of the equation

$$\inf_{\varphi : P(c - \varphi) = 0} \int_X \varphi(x) \, d\mu = \sup_{\pi \in \Pi(\mu, \sigma)} \int_{X \times \Omega} c \, d\pi + H(\pi) \tag{15}$$

is given by

$$\varphi_1 = -\log(z_1), \qquad \varphi_2 = -\log(z_2),$$

where $(z_1, z_2)$ belongs to the set of positive solutions of the following system of equations

$$\begin{cases} a z_1^2 + b z_2^2 + c z_1 z_2 + d z_1 + e z_2 + 1 = 0, \\ 2bp z_2^2 - 2a(1 - p) z_1^2 + c(2p - 1) z_1 z_2 + ep z_2 - d(1 - p) z_1 = 0, \end{cases} \tag{16}$$

where $a = \det C^1$, $b = \det C^2$, $c = \det C^{12} + \det C^{21}$, $d = -\operatorname{tr} C^1$, and $e = -\operatorname{tr} C^2$.

This means that $(z_1, z_2)$ belongs to the set of intersection points of the two conics given by the equations above. There are at most four such intersection points, and $(z_1, z_2)$ is the one that minimizes $-p \log(z_1) - (1 - p) \log(z_2)$.

(b) Suppose also that $C^1$ and $C^2$ are stochastic matrices. Then the unique minimizer $\varphi$ of the left-hand side of the equation (15) is given by

$$\varphi_1 = -\log(p), \qquad \varphi_2 = -\log(1 - p).$$

Also we have

$$\sup_{\pi \in \Pi(\mu, \sigma)} \int c \, d\pi + H(\pi) = -p \log(p) - (1 - p) \log(1 - p).$$
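The coefficients of system (16) are simple to compute from $C^1$ and $C^2$. The following sketch builds them and evaluates both equations at a point; it assumes that $C^{12}$ and $C^{21}$ are obtained by mixing the rows of $C^1$ and $C^2$ (mixing columns instead gives the same value of $c$). The numerical matrices are hypothetical column-stochastic examples, for which item (b) predicts the solution $(p, 1 - p)$.

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def conic_coefficients(C1, C2):
    """Coefficients a, b, c, d, e of system (16) built from C^1 and C^2."""
    C12 = [C1[0], C2[1]]                      # first row of C^1, second row of C^2
    C21 = [C2[0], C1[1]]                      # first row of C^2, second row of C^1
    a, b = det2(C1), det2(C2)
    c = det2(C12) + det2(C21)
    d = -(C1[0][0] + C1[1][1])
    e = -(C2[0][0] + C2[1][1])
    return a, b, c, d, e

def system_16(z1, z2, p, coeffs):
    """Left-hand sides of the two equations in (16)."""
    a, b, c, d, e = coeffs
    g = a * z1**2 + b * z2**2 + c * z1 * z2 + d * z1 + e * z2 + 1.0
    h = (2 * b * p * z2**2 - 2 * a * (1 - p) * z1**2 + c * (2 * p - 1) * z1 * z2
         + e * p * z2 - d * (1 - p) * z1)
    return g, h

# Hypothetical column-stochastic matrices and p = 0.4.
C1 = [[0.7, 0.2], [0.3, 0.8]]
C2 = [[0.6, 0.3], [0.4, 0.7]]
p = 0.4
coeffs = conic_coefficients(C1, C2)
g_val, h_val = system_16(p, 1 - p, p, coeffs)   # both should vanish up to rounding
```

The first equation is the expansion of $\det(z_1 C^1 + z_2 C^2 - I) = 0$, so it can also be checked against a direct determinant computation.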
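Proof: First we need to understand what the restriction $P(c - \varphi) = 0$ means in the left-hand side of equation (15). By Theorem 3, $P(c - \varphi) = \log(\lambda_{c - \varphi})$, where $\lambda_{c - \varphi}$ is the main eigenvalue of $L_{c - \varphi}$. Therefore $P(c - \varphi) = 0$ means $\lambda_{c - \varphi} = 1$. As $\varphi : X \to \mathbb{R}$ and $X = \{1, 2\}$, $\varphi$ can be seen as a two-dimensional vector $\varphi = (\varphi_1, \varphi_2)$. Then we have

$$L_{c - \varphi}(v)(y) = \sum_{i \in \{1, 2\}} \left( e^{c^1_{ij} - \varphi_1} + e^{c^2_{ij} - \varphi_2} \right) v(iy),$$

where $y = (j, y_2, y_3, \dots)$, so the operator $L_{c - \varphi}$ is represented by the $2 \times 2$ matrix

$$B = \begin{pmatrix} e^{c^1_{11} - \varphi_1} + e^{c^2_{11} - \varphi_2} & e^{c^1_{12} - \varphi_1} + e^{c^2_{12} - \varphi_2} \\ e^{c^1_{21} - \varphi_1} + e^{c^2_{21} - \varphi_2} & e^{c^1_{22} - \varphi_1} + e^{c^2_{22} - \varphi_2} \end{pmatrix}. \tag{17}$$

Finally, using (14), $P(c - \varphi) = 0$ means that $\lambda_{c - \varphi} = 1$ is the dominant eigenvalue of $B$.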

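If we apply the change of coordinates $z_1 = e^{-\varphi_1}$, $z_2 = e^{-\varphi_2}$, then $B = z_1 C^1 + z_2 C^2$ and $P(c - \varphi) = 0$ if and only if $(z_1, z_2)$ satisfies

(i) $z_1 > 0$, $z_2 > 0$,

(ii) $\det(z_1 C^1 + z_2 C^2 - I) = 0$, that is, 1 is an eigenvalue of the matrix $B$ given in (17),

(iii) the other eigenvalue of the matrix $B$ given in (17) is less than 1.

Note that we want to find the unique minimizer of

$$\inf_{\varphi : P(c - \varphi) = 0} \int_X \varphi(x) \, d\mu,$$

but instead of that we will minimize $\int \varphi(x) \, d\mu$ subject to the restriction that $(z_1, z_2)$ satisfies condition (ii)$^1$. After we get all the solutions of this problem we will test which of them also satisfy conditions (i) and (iii).

Condition (ii) is equivalent to

$$z_1^2 \det C^1 + z_2^2 \det C^2 + z_1 z_2 \left( \det C^{12} + \det C^{21} \right) - z_1 \operatorname{tr} C^1 - z_2 \operatorname{tr} C^2 + 1 := a z_1^2 + b z_2^2 + c z_1 z_2 + d z_1 + e z_2 + 1 = 0.$$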


If we denote by

$$g(z_1, z_2) = a z_1^2 + b z_2^2 + c z_1 z_2 + d z_1 + e z_2 + 1,$$

then condition (ii) describes an algebraic curve (a conic) in $\mathbb{R}^2$ given by

$$g(z_1, z_2) = 0.$$


Therefore we need to minimize

$$f(z_1, z_2) := \int \varphi(x) \, d\mu = \varphi_1 p + \varphi_2 (1 - p) = -p \log z_1 - (1 - p) \log z_2,$$

subject to the restriction $g(z_1, z_2) = 0$. To do that we will use the Lagrange multiplier theorem: if $(z_1, z_2)$ is a solution to the constrained minimization problem above, there exists a $\lambda \in \mathbb{R}$ such that

$$\nabla f(z_1, z_2) = \lambda \nabla g(z_1, z_2).$$

Using that

$$\nabla f(z_1, z_2) = \left( -\frac{p}{z_1}, \, -\frac{1 - p}{z_2} \right), \qquad \nabla g(z_1, z_2) = \left( 2a z_1 + c z_2 + d, \; 2b z_2 + c z_1 + e \right),$$

we need to solve the following equations

$$-\frac{p}{z_1} = \lambda (2a z_1 + c z_2 + d) \quad \text{and} \quad -\frac{1 - p}{z_2} = \lambda (2b z_2 + c z_1 + e),$$

or

$$\frac{p}{z_1 (2a z_1 + c z_2 + d)} = \frac{1 - p}{z_2 (2b z_2 + c z_1 + e)},$$

or

$$p (2b z_2^2 + c z_1 z_2 + e z_2) = (1 - p)(2a z_1^2 + c z_2 z_1 + d z_1),$$

which are equivalent to

$$2bp z_2^2 - 2a(1 - p) z_1^2 + c(2p - 1) z_1 z_2 + ep z_2 - d(1 - p) z_1 = 0, \tag{18}$$

which, together with the equation $g = 0$, gives the system of equations (16).

$^1$ See the remark after Theorem 6.

In order to conclude the proof of item (a) of Theorem 6 we remark that the unique minimizer of $\inf_{\varphi : P(c - \varphi) = 0} \int_X \varphi(x) \, d\mu$ belongs to the intersection of the two conics in (16), which consists of at most four points. For each of these solutions we test whether it satisfies conditions (i) (that is, the solution has to be in the positive quadrant) and (iii). If there is more than one solution that satisfies conditions (i)-(iii), then we test which one is the minimizer.

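Now, in order to prove item (b) of Theorem 6, let us suppose that $C^1$ and $C^2$ are stochastic matrices, i.e.

$$C^1 = \begin{pmatrix} e^{c^1_{11}} & e^{c^1_{12}} \\ 1 - e^{c^1_{11}} & 1 - e^{c^1_{12}} \end{pmatrix}, \qquad C^2 = \begin{pmatrix} e^{c^2_{11}} & e^{c^2_{12}} \\ 1 - e^{c^2_{11}} & 1 - e^{c^2_{12}} \end{pmatrix}. \tag{19}$$

We have $a = e^{c^1_{11}} - e^{c^1_{12}}$, $b = e^{c^2_{11}} - e^{c^2_{12}}$, $c = a + b$, $d = -(a + 1)$, $e = -(b + 1)$, and the equation $g = 0$ becomes

$$a z_1^2 + b z_2^2 + (a + b) z_1 z_2 - (a + 1) z_1 - (b + 1) z_2 + 1 = 0, \tag{20}$$

while equation (18) becomes

$$2bp z_2^2 - 2a(1 - p) z_1^2 + (a + b)(2p - 1) z_1 z_2 - (b + 1) p z_2 + (a + 1)(1 - p) z_1 = 0. \tag{21}$$

First we solve equation (20) for $z_1$ in terms of $z_2$. We get, if $a \neq 0$, two solutions $z_1 = \frac{1 - b z_2}{a}$ and $z_1 = 1 - z_2$, and if $a = 0$ we get only the solution $z_1 = 1 - z_2$.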

Now, we solve (21):

Case 1: if $z_1 = \frac{1 - b z_2}{a}$, then equation (21) becomes

$$2bp z_2^2 - 2a(1 - p) \left( \frac{1 - b z_2}{a} \right)^2 + (a + b)(2p - 1) \frac{1 - b z_2}{a} z_2 - (b + 1) p z_2 + (a + 1)(1 - p) \frac{1 - b z_2}{a} = 0.$$

This equation can be rewritten as

$$\left( b - \frac{b^2}{a} \right) z_2^2 + \left( \frac{2b - bp}{a} + p - 1 - b \right) z_2 + \frac{p - 1}{a} + (1 - p) = 0$$

or

$$(ba - b^2) z_2^2 + (2b - bp + pa - a - ba) z_2 + p - 1 + a - pa = 0.$$

Finally, this is equivalent to

$$\left( a(z_2 - 1) - b z_2 + 1 \right) \left( b z_2 + p - 1 \right) = 0,$$

which has two solutions, $z_2 = \frac{a - 1}{a - b}$ and $z_2 = \frac{1 - p}{b}$.

Then, we get two possible pairs of solutions $(z_1, z_2)$ for the system given by (20) and (21): if $z_2 = \frac{a - 1}{a - b}$, then $z_1 = \frac{1 - b z_2}{a} = \frac{1 - b}{a - b}$, and if $z_2 = \frac{1 - p}{b}$, then $z_1 = \frac{1 - b z_2}{a} = \frac{p}{a}$.

Case 2: if $z_1 = 1 - z_2$, then equation (21) becomes

$$2bp z_2^2 - 2a(1 - p)(1 - 2 z_2 + z_2^2) + (a + b)(2p - 1)(z_2 - z_2^2) - (b + 1) p z_2 + (a + 1)(1 - p)(1 - z_2) = 0,$$

which can be rewritten as

$$(-a + b) z_2^2 + (2a - ap + bp - b - 1) z_2 + (ap - a - p + 1) = 0.$$

Finally, this is equivalent to

$$(p + z_2 - 1) \left( a(z_2 - 1) - b z_2 + 1 \right) = 0,$$

which has two solutions, $z_2 = \frac{a - 1}{a - b}$ and $z_2 = 1 - p$.

Then, we get another two possible pairs of solutions $(z_1, z_2)$:

if $z_2 = \frac{a - 1}{a - b}$, then $z_1 = 1 - z_2 = \frac{1 - b}{a - b}$, and if $z_2 = 1 - p$, then $z_1 = 1 - z_2 = p$.

Collecting all the solutions of Cases 1 and 2, we have 3 different solutions $(z_1, z_2)$ for the system given by (20) and (21):

$$\left( \frac{1 - b}{a - b}, \, \frac{a - 1}{a - b} \right), \qquad \left( \frac{p}{a}, \, \frac{1 - p}{b} \right), \qquad (p, \, 1 - p).$$


Now we test conditions (i) and (iii) for each one of the three possible solutions above: 
Claim: j ‘lo not satisfy item (i), and (^, do not satisfy item (i) or item 

(iii)- 

In fact, using that and are given by (fT^ . we have that 0 < < 1, o = e'^n — e ‘^12 

and b = and this implies — 1 < a < 1 and — 1 < 6 < 1, and as a consequence 

a — 1 < 0 and 6—1 < 0. Then, ii a ^ b, we have that Zi = and Z 2 = fE^ have opposite 
signs, and this not satisfy item (i). 

Let us now analyze the case $z_1 = \frac{p}{a}$, $z_2 = \frac{1-p}{b}$. We will show that either this solution does not satisfy item (i), or the dominant eigenvalue of the associated matrix is greater than 1, and therefore the solution does not satisfy item (iii). In fact, the associated matrix is

$$\frac{p}{e^{c^1_{11}} - e^{c^1_{12}}} \begin{pmatrix} e^{c^1_{11}} & e^{c^1_{12}} \\ 1 - e^{c^1_{11}} & 1 - e^{c^1_{12}} \end{pmatrix} + \frac{1-p}{e^{c^2_{11}} - e^{c^2_{12}}} \begin{pmatrix} e^{c^2_{11}} & e^{c^2_{12}} \\ 1 - e^{c^2_{11}} & 1 - e^{c^2_{12}} \end{pmatrix}.$$

This matrix has eigenvalues $\lambda_1 = 1$ and $\lambda_2 = \frac{p}{a} + \frac{1-p}{b}$. If this solution satisfies item (i), then $0 < z_1 = \frac{p}{a}$ and $0 < z_2 = \frac{1-p}{b}$, and then we have $0 < a < 1$ and $0 < b < 1$. This implies $\frac{p}{a} + \frac{1-p}{b} \geq p + 1 - p = 1$ (with equality only if $a = b = 1$, i.e., $z_1 = p$ and $z_2 = 1 - p$). In the case of strict inequality, $\lambda_2 > 1$ is the dominant eigenvalue.
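The eigenvalue computation above can be illustrated numerically. The following is a minimal sketch, assuming that each $C^x$ is a $2 \times 2$ column-stochastic matrix with entries in $(0,1)$ and that $a$ and $b$ are the differences of the first-row entries of $C^1$ and $C^2$; the helper names and the sample entries are hypothetical and chosen only for this check.

```python
import math


def eigenvalues_2x2(m):
    # Eigenvalues of a real 2x2 matrix with real spectrum, via trace and determinant.
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return ((tr - root) / 2.0, (tr + root) / 2.0)


def column_stochastic(u, v):
    # Assumed shape of C^x: first row (u, v), each column summing to 1.
    return [[u, v], [1.0 - u, 1.0 - v]]


def combine(z1, c1, z2, c2):
    # Entrywise z1*C^1 + z2*C^2.
    return [[z1 * c1[i][j] + z2 * c2[i][j] for j in range(2)] for i in range(2)]


p = 0.4
u1, v1 = 0.7, 0.2   # sample entries, so a = u1 - v1 = 0.5
u2, v2 = 0.6, 0.3   # sample entries, so b = u2 - v2 = 0.3
a, b = u1 - v1, u2 - v2

m = combine(p / a, column_stochastic(u1, v1), (1 - p) / b, column_stochastic(u2, v2))
small, large = eigenvalues_2x2(m)
print(small, large)
```

For these sample values the two eigenvalues are $1$ and $\frac{p}{a} + \frac{1-p}{b} = 2.8$, so the eigenvalue 1 is not dominant, in agreement with the argument above.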

We conclude that $(p, 1-p)$ is the only critical point that satisfies items (i), (ii), and (iii). Hence, as we know that there exists a solution to

$$\inf_{\varphi :\, P(c - \varphi) = 0} \int_X \varphi(x)\, d\mu,$$

it must be given by $(\varphi_1, \varphi_2) = (-\log(p), -\log(1-p))$, which finishes the proof of item b) of Theorem 6. ■
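As a sanity check of this conclusion, one can verify numerically that, on the branch $z_1 = 1 - z_2$ of the constraint, the function $-p \log z_1 - (1-p) \log z_2$ is minimized at $(z_1, z_2) = (p, 1-p)$. The sketch below uses a plain grid search; the grid resolution and the value of `p` are arbitrary illustrative choices, not taken from the text.

```python
import math


def objective(z1, z2, p):
    # f(z1, z2) = -p*log(z1) - (1 - p)*log(z2), defined for z1, z2 > 0.
    return -p * math.log(z1) - (1.0 - p) * math.log(z2)


p = 0.3
# Points z1 = k/1000 on the branch z2 = 1 - z1, strictly inside (0, 1).
grid = [k / 1000.0 for k in range(1, 1000)]
best_z1 = min(grid, key=lambda t: objective(t, 1.0 - t, p))
best_pair = (best_z1, 1.0 - best_z1)
phi = (-math.log(best_pair[0]), -math.log(best_pair[1]))
print(best_pair, phi)
```

The grid minimizer coincides with $(p, 1-p)$ up to the grid resolution, and the corresponding $\varphi$ is $(-\log p, -\log(1-p))$.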

Remark: Note that the original problem is to minimize $\int \varphi(x)\, d\mu = \varphi_1 p + \varphi_2 (1-p)$ subject to the restriction $\{\varphi : P(c - \varphi) = 0\}$. We know by [B] that there exists a unique minimizer of this problem, which we denote by $\bar\varphi = (\bar\varphi_1, \bar\varphi_2)$, and which satisfies $P(c - \bar\varphi) = 0$. Hence the matrix associated with $\bar\varphi$ has 1 as dominant eigenvalue, and this eigenvalue is simple.

Instead of directly finding the minimizer of the original problem, we solved a second problem, which is to minimize $f(z_1, z_2) = -p \log z_1 - (1-p) \log z_2$ subject to $g(z_1, z_2) = 0$, where $z_1 = e^{-\varphi_1}$ and $z_2 = e^{-\varphi_2}$.

Claim: The solution of the original problem, $\bar z = (\bar z_1, \bar z_2) = (e^{-\bar\varphi_1}, e^{-\bar\varphi_2})$, can be found in the set of extremal points of the second problem, which we determine using Lagrange multipliers.

Proof of the claim:

The conic $g(z) = 0$ consists of the points whose associated matrices have 1 as one of their eigenvalues. This conic can have two subsets: the first consists of the points whose matrices have 1 as maximal eigenvalue (call this subset $C_1$), and the other (call it $C_2$) of the points whose matrices have 1 as minimal eigenvalue. ($C_1$ and $C_2$ can intersect at points whose associated matrices have both eigenvalues equal to 1.) Let us show that $\bar z$ is in the interior of $C_1$. In fact, first note that $B^{\bar z}$ is strictly positive, which implies that the dominant eigenvalue of the matrix $B^{\bar z} = \bar z_1 C^1 + \bar z_2 C^2$ is simple, and therefore the other eigenvalue is smaller than 1. Now, using the fact that the spectrum of $B^{z} = z_1 C^1 + z_2 C^2$ varies continuously in the parameter $z$, we know that there exists a neighbourhood $V_{\bar z}$ of $\bar z$ in the curve $g(z) = 0$ such that the matrix $B^{z}$ has 1 as dominant and simple eigenvalue for any $z \in V_{\bar z}$. This implies that $\bar z$ is in the interior of $C_1$. Finally, as $f(\bar z) \leq f(z)$ for all $z \in C_1$ (because $\bar z$ is a minimum point of the original problem), $\bar z$ is a local minimum of the second problem and can be found using Lagrange multipliers. End of proof of the claim.

Remark on the claim: Note that the level curves of the function $f(z_1, z_2)$ have a simple geometry: on any level curve, $z_2$ is proportional to a negative power of $z_1$. Also, $g(z_1, z_2) = 0$ determines a conic, which also has a simple geometry, and this prevents any pathologies that could invalidate the argument above.


Now we want to perform Step 2.

The normalized cost associated to $A = c - \varphi$ is

$$\bar A = A + \log(h_A) - \log(h_A \circ \sigma) - \log(\lambda_A),$$

where $h_A$ is the eigenfunction of $L_A = L_{c-\varphi}$ associated to the maximal eigenvalue $\lambda_A = \lambda_{c-\varphi}$. We saw in the proof of Theorem 5 that $\lambda_{c-\varphi} = 1$. Moreover, $h_A = V_{c-\varphi}$ is the left eigenvector of $B$ associated to the eigenvalue 1, where

$$B = e^{-\varphi_1} C^1 + e^{-\varphi_2} C^2 = z_1 C^1 + z_2 C^2,$$

i.e., $h_A B = h_A$, where $(z_1, z_2) = (e^{-\varphi_1}, e^{-\varphi_2})$.


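Remark: If we are in case b) of the theorem, then the cost $A = c - \varphi$ is already normalized. In fact, if we denote $p(1) = p = e^{-\varphi_1}$ and $p(2) = 1 - p = e^{-\varphi_2}$, we have

$$\sum_{x \in X} \sum_{\sigma(w) = y} e^{c(x,w) - \varphi(x)} = \sum_{x \in X} \sum_{\sigma(w) = y} p(x)\, e^{c(x,w)} = \sum_{x \in X} p(x) \left[ e^{c(x,1,y_1)} + 1 - e^{c(x,1,y_1)} \right] = 1,$$

for all $y \in \Omega$.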


Example: Let us suppose that the cost $c(x, y) = c(x, y_1, y_2)$ is represented by the following matrices:

$$C^1 = \begin{pmatrix} 3 & 5 \\ 2 & 4 \end{pmatrix}, \qquad C^2 = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}.$$

In order to perform Step 1 we look for positive solutions of the system, and we get that the unique positive solution is $(z_1, z_2) = (0.101972, 0.0568922)$.

Now, in order to perform Step 2, we first need to calculate the left eigenvector, associated to the eigenvalue 1, of the matrix $B = z_1 C^1 + z_2 C^2$.

We have that

$$B = \begin{pmatrix} 0.4197 & 0.566751 \\ 0.431512 & 0.578563 \end{pmatrix},$$

and we get that $h_A = (0.596709, 0.802458)$ is such that $h_A B = h_A$.
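The numbers in this step can be reproduced with a few lines of code. The sketch below takes the matrices $C^1$, $C^2$ and the solution $(z_1, z_2)$ from the example as given, builds $B = z_1 C^1 + z_2 C^2$, and recovers a left eigenvector for the eigenvalue 1 from the first column of $h(B - I) = 0$. Normalizing $h_A$ to unit Euclidean norm is an assumption made here because it matches the values quoted in the text; the function names are hypothetical.

```python
import math

C1 = [[3.0, 5.0], [2.0, 4.0]]
C2 = [[2.0, 1.0], [4.0, 3.0]]
z = (0.101972, 0.0568922)  # positive solution quoted in the example


def combine(z, mats):
    # Entrywise sum_x z[x] * mats[x].
    return [[sum(z[x] * mats[x][i][j] for x in range(len(mats))) for j in range(2)]
            for i in range(2)]


def left_eigenvector_for_one(b):
    # First column of h(B - I) = 0 reads h1*(B11 - 1) + h2*B21 = 0,
    # so h is proportional to (B21, 1 - B11); normalize to unit length.
    h = (b[1][0], 1.0 - b[0][0])
    norm = math.hypot(h[0], h[1])
    return (h[0] / norm, h[1] / norm)


B = combine(z, [C1, C2])
h = left_eigenvector_for_one(B)
# Residual of h*B - h; small but nonzero because z is rounded to six digits.
residual = [sum(h[i] * B[i][j] for i in range(2)) - h[j] for j in range(2)]
print(B, h, residual)
```

Up to the rounding of $(z_1, z_2)$, this reproduces the matrix $B$ and the vector $h_A$ stated above.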
Then the matrices

$$P^1 = \begin{pmatrix} 0.3059 & 0.379132 \\ 0.274264 & 0.407887 \end{pmatrix}, \qquad P^2 = \begin{pmatrix} 0.113784 & 0.0423052 \\ 0.306036 & 0.170677 \end{pmatrix},$$

which are obtained by the expression

$$P^x_{ij} = z_x\, C^x_{ij}\, \frac{h_A(i)}{h_A(j)} = e^{c(x,i,j) - \varphi(x) + \log(h_A(i)) - \log(h_A(j))},$$

represent the normalized cost $\bar A = A + \log(h_A) - \log(h_A \circ \sigma)$.
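The normalization can also be checked numerically. The sketch below, which takes $C^1$, $C^2$, $(z_1, z_2)$ and $h_A$ from the example as given, computes the entries $z_x C^x_{ij} h_A(i)/h_A(j)$ and verifies that, for each fixed $j$, the sum over $x$ and $i$ is approximately 1. The variable names are hypothetical, and the small deviations from 1 come from the six-digit rounding of the inputs.

```python
C = [
    [[3.0, 5.0], [2.0, 4.0]],  # C^1 from the example
    [[2.0, 1.0], [4.0, 3.0]],  # C^2 from the example
]
z = (0.101972, 0.0568922)      # (z_1, z_2) quoted in the example
h = (0.596709, 0.802458)       # left eigenvector h_A quoted in the example


def normalized_matrix(x):
    # Entry (i, j) is z_x * C^x_ij * h(i) / h(j).
    return [[z[x] * C[x][i][j] * h[i] / h[j] for j in range(2)] for i in range(2)]


P = [normalized_matrix(x) for x in range(2)]

# For each column j, sum over x and i; normalization means each sum is close to 1.
column_sums = [
    sum(P[x][i][j] for x in range(2) for i in range(2))
    for j in range(2)
]
print(P, column_sums)
```

The computed entries agree with the two matrices displayed above to the quoted precision.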

In fact, we have that $C^x_{ij} = e^{c(x,i,j)}$; as $(z_1, z_2) = (e^{-\varphi_1}, e^{-\varphi_2})$, we have $z_x C^x_{ij} = e^{c(x,i,j) - \varphi(x)}$; and finally, if $y = (i, j, y_3, y_4, \ldots)$, then $\sigma(y) = (j, y_3, y_4, \ldots)$.

Therefore

$$P^x_{ij} = e^{c(x,i,j) - \varphi(x) + \log(h_A(y)) - \log(h_A \circ \sigma(y))} = e^{\bar A(x,y)}.$$